In [37]:
pricing = get_pricing(['AAPL', 'MSFT'],
                      start_date='2014-01-01',
                      end_date='2014-01-07',
                      frequency='minute',
                      fields='price')
pricing is a DataFrame with the same structure as the return value of history on Quantopian.
In [38]:
pricing.head(10)
Out[38]:
Pandas' built-in groupby and apply operations are extremely powerful. For more information on these features, see http://pandas.pydata.org/pandas-docs/stable/groupby.html.
In [39]:
from pandas.tseries.tools import normalize_date

def my_grouper(ts):
    "Function to apply to the index of the DataFrame to break it into groups."
    # Returns midnight of the supplied date.
    return normalize_date(ts)

def first_thirty_minutes(frame):
    "Function to apply to the resulting groups."
    # Keep only the first 30 rows (minute bars) of each group.
    return frame.iloc[:30]
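The import path above depends on the pandas version installed, and it may not be available in newer releases. As a hedged alternative, the sketch below gets the same midnight value from the Timestamp.normalize() method. The name my_grouper_alt is hypothetical and is not part of the original notebook.

```python
import pandas as pd

def my_grouper_alt(ts):
    "Hypothetical variant of my_grouper that uses Timestamp.normalize()."
    # Returns midnight of the supplied timestamp, keeping its timezone.
    return ts.normalize()

ts = pd.Timestamp('2014-01-03 14:58:00', tz='UTC')
midnight = my_grouper_alt(ts)
```

Because the function only relies on a method of the Timestamp it receives, it can be passed to groupby in the same way as my_grouper.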
The result of a groupby computation is a hierarchically indexed DataFrame, where the outermost layer of the index is the groupby key and the secondary layers are the values from the frame's original index.
In [40]:
data = pricing.groupby(my_grouper).apply(first_thirty_minutes)
data.head(40)
Out[40]:
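Since get_pricing is only available on Quantopian, here is a minimal sketch of the same groupby/apply pattern on synthetic data. The frame toy, the helpers by_day and first_two, and all of the values are assumptions made up for illustration. Each day has 5 rows and we keep the first 2, so the shape of the result is easy to check by eye.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for minute pricing: 3 days x 5 minute bars, 2 columns.
days = pd.date_range('2014-01-02', periods=3, freq='D', tz='UTC')
index = pd.DatetimeIndex(
    [day + pd.Timedelta(hours=14, minutes=31 + i) for day in days for i in range(5)]
)
toy = pd.DataFrame(
    {'AAPL': np.arange(15, dtype=float), 'MSFT': np.arange(15, dtype=float) * 2},
    index=index,
)

def by_day(ts):
    "Group key: midnight of the bar's date."
    return ts.normalize()

def first_two(frame):
    "Keep the first two rows of each group."
    return frame.iloc[:2]

# group_keys=True asks pandas to put the group key in the outer index level.
grouped = toy.groupby(by_day, group_keys=True).apply(first_two)
print(grouped.index.nlevels)  # number of index levels in the result
print(grouped)
```

The outer level of grouped.index holds one midnight timestamp per day, and the inner level holds the original bar timestamps.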
Because our DataFrame is hierarchically indexed, we can query it by our groupby keys.
In [41]:
from pandas import Timestamp
# This gives us the first thirty minutes of January 3rd.
data.loc[Timestamp('2014-01-03', tz='UTC')]
Out[41]:
If we want to query on the second layer of the index, we have to use .xs with a level argument instead of .loc.
Note that level=1 means the second level of the index, because the levels start at index 0.
In [42]:
data.xs(Timestamp('2014-01-03 14:58:00', tz='UTC'), level=1)
Out[42]:
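To make the difference between .loc and .xs concrete, here is a small sketch on a hand-built two-level frame. The level names session and bar, and all of the values, are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical two-level index: (session midnight, bar timestamp).
bars = pd.to_datetime(
    ['2014-01-02 14:31', '2014-01-02 14:32', '2014-01-03 14:31', '2014-01-03 14:32'],
    utc=True,
)
toy = pd.DataFrame(
    {'AAPL': [1.0, 2.0, 3.0, 4.0], 'MSFT': [10.0, 20.0, 30.0, 40.0]},
    index=pd.MultiIndex.from_arrays([bars.normalize(), bars], names=['session', 'bar']),
)

# .loc with a single key matches the outermost level (the session).
one_session = toy.loc[pd.Timestamp('2014-01-03', tz='UTC')]

# .xs with level=1 matches the second level (the bar timestamp) instead.
one_bar = toy.xs(pd.Timestamp('2014-01-03 14:32', tz='UTC'), level=1)
```

Here one_session contains both bars from January 3rd, while one_bar contains the single row whose inner index value matches the requested minute.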
If we just want to work with the original index values, we can drop the extra level from our index.
In [43]:
data_copy = data.copy()
data_copy.index = data_copy.index.droplevel(0)
data_copy.head()
Out[43]:
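The same idea can be sketched on a small hand-built frame (hypothetical values, not Quantopian data). We also show reset_index(level=0, drop=True), which should produce the same flat index without assigning to .index directly.

```python
import pandas as pd

# Hypothetical two-level frame: (session midnight, bar timestamp).
bars = pd.to_datetime(
    ['2014-01-02 14:31', '2014-01-02 14:32', '2014-01-03 14:31'], utc=True
)
toy = pd.DataFrame(
    {'AAPL': [1.0, 2.0, 3.0]},
    index=pd.MultiIndex.from_arrays([bars.normalize(), bars]),
)

# Option 1: copy the frame, then drop the outer level from the copy's index.
flat = toy.copy()
flat.index = flat.index.droplevel(0)

# Option 2: reset_index returns a new frame with the outer level discarded.
flat_alt = toy.reset_index(level=0, drop=True)
```

Both options leave toy itself untouched, so the hierarchically indexed version remains available for queries by group key.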